VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia
Internal identifier: 000404 (Main/Exploration)
Authors: Aastha Madaan [Japan]; Wanming Chu [Japan]; Subhash Bhalla [Japan]
Source:
- Lecture Notes in Computer Science [ 0302-9743 ] ; 2011.
Abstract
The World Wide Web has become the largest source of information. Consequently, web-based information retrieval, information extraction, automatic page adaptation, and deep-web querying are gaining importance. The need for information retrieval applications is increasing. To address the problems of ever-expanding information on the internet, traditional information retrieval techniques have been applied. Such techniques are sometimes time-consuming and laborious, and the results obtained may be unsatisfactory. This study is an attempt to query web pages such as the MedlinePlus medical encyclopedia by segmenting them. It summarizes the existing approaches to web page segmentation from the perspective of “structure realization for improved querying” on the web. It proposes a new algorithm, VisHue, for web page segmentation based on visual cues and heuristics, and uses the hierarchical structure it generates to develop the Query by Segment or Tag (QBT) query interface. This interface is close to the end user and exploits the relationships among the various content groups within a web page. Such an improved query interface enables the user to perform in-depth querying. It is a step beyond page-level search.
DOI: 10.1007/978-3-642-25731-5_9
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000977
- to stream Istex, to step Curation: 000966
- to stream Istex, to step Checkpoint: 000060
- to stream Main, to step Merge: 000409
- to stream Main, to step Curation: 000404
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia</title>
<author><name sortKey="Madaan, Aastha" sort="Madaan, Aastha" uniqKey="Madaan A" first="Aastha" last="Madaan">Aastha Madaan</name>
</author>
<author><name sortKey="Chu, Wanming" sort="Chu, Wanming" uniqKey="Chu W" first="Wanming" last="Chu">Wanming Chu</name>
</author>
<author><name sortKey="Bhalla, Subhash" sort="Bhalla, Subhash" uniqKey="Bhalla S" first="Subhash" last="Bhalla">Subhash Bhalla</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:7EF68FF10F53FA37822A74035B17C3BB237AE5C9</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-25731-5_9</idno>
<idno type="url">https://api.istex.fr/document/7EF68FF10F53FA37822A74035B17C3BB237AE5C9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000977</idno>
<idno type="wicri:Area/Istex/Curation">000966</idno>
<idno type="wicri:Area/Istex/Checkpoint">000060</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Madaan A:vishue:web:page</idno>
<idno type="wicri:Area/Main/Merge">000409</idno>
<idno type="wicri:Area/Main/Curation">000404</idno>
<idno type="wicri:Area/Main/Exploration">000404</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia</title>
<author><name sortKey="Madaan, Aastha" sort="Madaan, Aastha" uniqKey="Madaan A" first="Aastha" last="Madaan">Aastha Madaan</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>University of Aizu, 965-8580, Aizu-Wakamatsu Shi, Fukushima-ken</wicri:regionArea>
<wicri:noRegion>Fukushima-ken</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author><name sortKey="Chu, Wanming" sort="Chu, Wanming" uniqKey="Chu W" first="Wanming" last="Chu">Wanming Chu</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>University of Aizu, 965-8580, Aizu-Wakamatsu Shi, Fukushima-ken</wicri:regionArea>
<wicri:noRegion>Fukushima-ken</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author><name sortKey="Bhalla, Subhash" sort="Bhalla, Subhash" uniqKey="Bhalla S" first="Subhash" last="Bhalla">Subhash Bhalla</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>University of Aizu, 965-8580, Aizu-Wakamatsu Shi, Fukushima-ken</wicri:regionArea>
<wicri:noRegion>Fukushima-ken</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">7EF68FF10F53FA37822A74035B17C3BB237AE5C9</idno>
<idno type="DOI">10.1007/978-3-642-25731-5_9</idno>
<idno type="ChapterID">9</idno>
<idno type="ChapterID">Chap9</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: World Wide Web has become the largest source of information. Consequently web based information retrieval, information extraction; automatic page adaptation and querying deep-web are gaining importance. The need for information retrieval applications is increasing. To address the problems of the ever expanding information over the internet, traditional information retrieval techniques have been applied. Such techniques are sometimes time consuming, and laborious, and the results obtained may be unsatisfactory. This study is an attempt to query web pages like MedlinePlus medical encyclopedia by segmenting the web pages. It summarizes the existing approaches for web page segmentation from the perspective of “structure realization for improved querying” on the web. It proposes a new algorithm VisHue for web page segmentation based on visual cues and heuristics and further uses the hierarchical structure generated by it to develop the Query by Segment or Tag (QBT) query interface. This interface is close to the end-user and exploits the relationships among the various content groups within a web page. Such an improved query-interface enables the user to perform in-depth querying. It is a step beyond the page-level search.</div>
</front>
</TEI>
<affiliations><list><country><li>Japon</li>
</country>
</list>
<tree><country name="Japon"><noRegion><name sortKey="Madaan, Aastha" sort="Madaan, Aastha" uniqKey="Madaan A" first="Aastha" last="Madaan">Aastha Madaan</name>
</noRegion>
<name sortKey="Bhalla, Subhash" sort="Bhalla, Subhash" uniqKey="Bhalla S" first="Subhash" last="Bhalla">Subhash Bhalla</name>
<name sortKey="Bhalla, Subhash" sort="Bhalla, Subhash" uniqKey="Bhalla S" first="Subhash" last="Bhalla">Subhash Bhalla</name>
<name sortKey="Chu, Wanming" sort="Chu, Wanming" uniqKey="Chu W" first="Wanming" last="Chu">Wanming Chu</name>
<name sortKey="Chu, Wanming" sort="Chu, Wanming" uniqKey="Chu W" first="Wanming" last="Chu">Wanming Chu</name>
<name sortKey="Madaan, Aastha" sort="Madaan, Aastha" uniqKey="Madaan A" first="Aastha" last="Madaan">Aastha Madaan</name>
</country>
</tree>
</affiliations>
</record>
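The record above can also be read programmatically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; it is not part of Dilib. The record uses the `wicri:` prefix without declaring it, which a strict XML parser rejects, so the sketch declares a placeholder namespace on the root element first. The URI `urn:example:wicri`, the helper names `parse_record` and `summarize`, and the shortened `SAMPLE` record are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the record uses the wicri: prefix without
# declaring it, so a placeholder is bound here purely to allow parsing.
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text: str) -> ET.Element:
    """Parse a record after declaring the wicri prefix on the root element."""
    patched = xml_text.replace(
        "<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1
    )
    return ET.fromstring(patched)


def summarize(record: ET.Element) -> dict:
    """Collect the title, author names, and DOI from the TEI header."""
    title_stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    doi = record.find(".//publicationStmt/idno[@type='doi']")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "doi": doi.text if doi is not None else None,
    }


# Shortened copy of the record, reduced to the fields read above.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia</title>
<author><name>Aastha Madaan</name></author>
<author><name>Wanming Chu</name></author>
<author><name>Subhash Bhalla</name></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1007/978-3-642-25731-5_9</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

info = summarize(parse_record(SAMPLE))
print(info["title"])
print(info["authors"])
print(info["doi"])
```

Patching the root element keeps the rest of the record untouched. A real pipeline would use whatever namespace URI the Wicri tooling actually defines, which this page does not state.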
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000404 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000404 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:7EF68FF10F53FA37822A74035B17C3BB237AE5C9 |texte= VisHue: Web Page Segmentation for an Improved Query Interface for MedlinePlus Medical Encyclopedia }}
This area was generated with Dilib version V0.6.32.